University of Michigan – Guava (Graph Understanding Algebra for Visual Analytics)

VAST 2009 Challenge
Challenge 2 - Social Network and Geospatial

Authors and Affiliations:

Hao Zhou, University of Michigan, zhouhao@umich.edu  [PRIMARY contact]
Anna Shaverdian, University of Michigan, annaas@umich.edu [PRIMARY contact]
H.V. Jagadish, University of Michigan, jag@umich.edu [Faculty advisor]

George Michailidis, University of Michigan, gmichail@umich.edu [Faculty advisor]

Tool(s):

Our analysis is based on the Visual Analytic Algebra presented in [1].  This algebra provides a structured framework for visual analytics and is based on a number of operators, including selection and aggregation.  We found that Cytoscape [2], a popular open source tool for biologists to visualize interaction networks, offers most of the desired functionality required by our proposed algebra to guide us in the identification of the underlying social structure.

 

Further, a number of the available plugins, together with the ability for one to implement in Java new ones, provide us with the required enhanced functionality. Although Cytoscape constitutes the main tool, a number of additional tools and methods, such as Java, R and Microsoft Office, were also employed for validation purposes and further analysis of the data set.

.

 

 

 

 

 

Video:

 

Video

 

ANSWERS:


MC2.1: Which of the two social structures, A or B, most closely match the scenario you have identified in the data?

A

 


MC2.2:  Provide the social network structure you have identified as a tab delimitated file. It should contain the employee, one or more handler, any middle folks, and the localized leader with their international contacts. What are the Flitter names of the persons involved? Please identify only key connections (not all single links for example) as well as any other nodes related to the scenario (if any) you may have discovered that were not described in the two scenarios A and B above.  

Flitter.txt


MC2.3:  Characterize the difference between your social network and the closest social structure you selected (A or B). If you include extra nodes please explain how they fit in to your

 

The social network we identified above perfectly matches hypothesis A.

 

We begin by exploring which hypothesis is more likely.  This task is trying to identify a subgraph.  There exist plugins to identify subgraphs, but there are several uncertainties existing in each hypothesis.  These uncertainties require a more investigative series of eliminations to identify the criminal employee and his network. 

 

To add to our comprehension of the network, we use the Network Analysis plugin to compute several network statistics.  Network Analysis saves the node metrics as node attributes.  We study the node degree distribution chart to determine how to proceed in identifying the criminal network.  Based on this chart, shown in Figure 1, if we start by filtering employees we might find a solution faster because there are fewer nodes at this degree range than the middle men degree range. 

 

Slide1

Figure 1:  Hairball picture of network.  Showing several Cytoscape features.

 

First, we attempt to match social structure A.   We select all nodes with degree around 40 to collect our initial employee suspect list.  Using the ESP Plugin, we enter in its simple query language: “Degree: {35 TO 45}” to select and display these nodes in Cytoscape’s Data Panel.  In the Data Panel, we specify which attributes to display and order the nodes by a selected attribute.  At this stage there are too many possible employees to visually identify the social structure.  So we continue to filter this list by applying more of social structure A’s information.

 

Next we identify the handlers: suspect employee neighbors with degrees between 30 to 40.  In Cytoscape this task requires three operations.  First we select the neighbors of the suspect employees with a plugin we wrote, Neighbor Node Select.  We save this list of nodes into its own network view.  In the Control Panel, Cytoscape saves and lists several networks within one Cytoscape session.  This history mechanism is useful for returning to different manipulations of a network.  Next, we use ESP to select nodes which have a degree 30 to 40.  We save this list of nodes as our initial handlers.

 

We refine the current employee and handler lists by filtering employees with less than three handlers.  To do this we create a sub network of the current employee and handler nodes from the original graph by selecting these nodes from their saved lists.  Using this sub network, we filter employees which have less than degree 3.  This operation results in a refined suspect employees and handlers list.  But there are still too many nodes to visualize the crime network.  So we continue filtering.

 

To complete social structure A, we find the possible Boris list.  We select handler neighbors with degree 4 to 5.  We see that only degree 5 Borises exist.  From the hypothesis description, we know 3 handlers have a Boris in common.  We use this to refine the current employee, handler, and Boris lists.  We have reduced the number of possible networks but we continue to apply more information from social structure A to pick a criminal network with high confidence.

 

The final elimination we apply is that Boris must have a neighbor with degree greater than 100: the fearless leader.  We can eliminate some Borises this way by ignoring the navy blue nodes without a white node neighbor.  We save the final Employee, Boris, Handler, and Fearless leader lists.  From the original graph, we select these nodes and produce a new sub network.  We use our saved lists to extract the sub network including the employee, Boris, handlers, and fearless leader nodes.  This network represents all possible criminal employee networks with social structure A. 

 

At this stage we can finally observe the most likely social structure A network from the limited possibilities.   For example, we know handlers do not communicate; therefore, node 171 is unlikely the criminal employee.  This process helps us produce a solution, centered at employee node 100, which perfectly fits social structure A.  We complete the network by identifying the International Contacts of the Leader.   The fearless leader has a broad network of 14 international contacts and is located in a larger city.  This extra information adds to our confidence that node 100 is the criminal employee.    Figure 2 shows a few of the finals networks we analyze.

 

Slide2

Figure 2:  Possible Borises and different networks we analyzed.  

 

The other possibilities available might be more networks that the Embassy may wish to investigate.  We discover other sub-networks similar to social structure A and B.  But, we have lower confidence in these sub networks because of their conflicts with the descriptions: incorrect degrees or locations or handlers communications. 

 

For example, the following sub-network is similar to hypothesis A, except Boris here has a degree of 6:

 

ID

Role

Filter Name

142

Employee

@lafouge

38

Handler

@krintz

101

Handler

@lonning

318

Handler

@formenti

4980  (degree = 6)

Middleman

@rosch

11

Fearless Leader

@cornell

 

Also, there exist sub-networks similar to hypothesis B with the conflict being the middlemen degree.  Hypothesis B states the middlemen have a degree within 3-4.  Here we show one example:

 

ID

Role

Filter Name

224

Employee

@usdin

31

Handler 1

@degtyarev

19

Handler 2

@marrevee

79

Handler 3

@terekhov

4769

Middleman to Handler 1

@brockman

3970

Middleman to Handler 2

@rego

2625

Middleman to Handler 3

@hennebert

5

Fearless Leader

@jantke

 

 


MC2.4:  How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one? 

Sub network employee 100 fits in the city location structure of the suspected network well. Both employee and handlers are in the large city. The middleman is in the nearby middle sized city. The fearless leader is also in the middle sized city with many international connections.  Figure 3 shows the location distribution pie charts created in Exel to verify the members geographical information.

 

 

Slide3

Figure 3:  Location distribution of crime A network created in Excel.

 


MC2.5:  In general, how are the Flitter users dispersed throughout the cities of this challenge? Which of the surrounding countries may have ties to this criminal operation?  Why might some be of more significant concern than others?

Over half of the flitter users are in the large cities of Flovania.  Only 7% of all the flitter users are not in Flovania.  All three other neighbor countries may have ties to criminal operation.  However, Othello has a slightly stronger connectivity.  The percentage of connections between foreign cities and suspected members are higher than average for flitter users.  Specifically, the three handlers are highly connected to at least one of the three surrounding countries.

 

References:

[1] A. Shaverdian, H. Zhou, H.V. Jagadish, G. Michailidis.  “Algebraic Visual Analysis: The Catalano Phone Call Data Set Case Study.”  VAKD 09.

 

[2]The Cytoscape Collaboration. Cytoscape Users Manual. The Cytoscape Collaboration, Institute for Systems Biology and University of California San Diego, 2006.